Enhancing Text Document Clustering Using Non-negative Matrix Factorization and WordNet

Authors

Chul-Won Kim

Sun Park

Abstract

A classic document clustering technique may incorrectly classify documents into different clusters when documents that should belong to the same cluster do not have any shared terms. Recently, to overcome this problem, internal and external knowledge-based approaches have been used for text document clustering. However, the clustering results of these approaches are influenced by the inherent structure and the topical composition of the documents. Further, the organization of knowledge into an ontology is expensive. In this paper, we propose a new enhanced text document clustering method using non-negative matrix factorization (NMF) and WordNet. The semantic terms extracted as cluster labels by NMF can represent the inherent structure of a document cluster well. The proposed method can also improve the quality of document clustering that uses cluster labels and term weights based on term mutual information of WordNet. The experimental results demonstrate that the proposed method achieves better performance than the other text clustering methods.
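The NMF step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses the standard Lee-Seung multiplicative updates on a raw term-count matrix, it omits the WordNet-based term weighting entirely, and the toy corpus and the names `nmf` and `cluster_documents` are hypothetical, chosen only for illustration. Each document is assigned to the semantic feature with the largest coefficient, and the highest-weighted terms of each feature serve as a rough stand-in for cluster labels.

```python
import random


def nmf(A, k, iters=300, seed=0, eps=1e-9):
    """Factor a non-negative m x n matrix A into W (m x k) and H (k x n)
    with Lee-Seung multiplicative updates (Frobenius-norm objective)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H), element-wise
        WtA = [[sum(W[i][a] * A[i][j] for i in range(m)) for j in range(n)]
               for a in range(k)]
        WtW = [[sum(W[i][a] * W[i][b] for i in range(m)) for b in range(k)]
               for a in range(k)]
        H = [[H[a][j] * WtA[a][j]
              / (sum(WtW[a][b] * H[b][j] for b in range(k)) + eps)
              for j in range(n)] for a in range(k)]
        # W <- W * (A H^T) / (W H H^T), element-wise
        AHt = [[sum(A[i][j] * H[a][j] for j in range(n)) for a in range(k)]
               for i in range(m)]
        HHt = [[sum(H[a][j] * H[b][j] for j in range(n)) for b in range(k)]
               for a in range(k)]
        W = [[W[i][a] * AHt[i][a]
              / (sum(W[i][b] * HHt[b][a] for b in range(k)) + eps)
              for a in range(k)] for i in range(m)]
    return W, H


def cluster_documents(docs, k, top_n=3):
    """Build a term-document count matrix, factor it with NMF, and return
    (cluster index per document, top terms per cluster)."""
    vocab = sorted({t for d in docs for t in d.lower().split()})
    # Rows are terms, columns are documents.
    A = [[d.lower().split().count(t) for d in docs] for t in vocab]
    W, H = nmf(A, k)
    # Assign each document to the feature with the largest coefficient.
    labels = [max(range(k), key=lambda a: H[a][j]) for j in range(len(docs))]
    # Use the highest-weighted terms of each feature as its label terms.
    top_terms = [
        [vocab[i] for i in sorted(range(len(vocab)),
                                  key=lambda i: W[i][a], reverse=True)[:top_n]]
        for a in range(k)
    ]
    return labels, top_terms


# Hypothetical toy corpus: two topics with no shared terms.
docs = [
    "matrix factorization cluster matrix",
    "cluster matrix factorization",
    "football goal league goal",
    "league football goal",
]
labels, top_terms = cluster_documents(docs, k=2)
for doc, label in zip(docs, labels):
    print(label, top_terms[label], "<-", doc)
```

Which cluster index each topic receives depends on the random initialization, so only the grouping is meaningful. In the proposed method, the term weights would additionally be adjusted using term mutual information from WordNet before clustering; that step would need a WordNet interface and is not sketched here.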

Related resources

Document Clustering Using Term Weights and Class Label Terms Based on Semantic Features

Class labels for clustering can be generated automatically, but such labels are of much lower quality than labels specified by humans. In this paper, we propose a new enhanced document clustering method using class label terms and term weights. The class label terms, obtained by non-negative matrix factorization (NMF), can represent the inherent structure of document clusters well. It can also improve the quality ...

Text Clustering using Semantic Terms

In traditional text clustering, documents are represented by term frequencies without considering the semantic information of each document (i.e., the vector model). Because of this property, the vector model may incorrectly classify documents into different clusters when documents of the same cluster lack shared terms. Recently, knowledge-based approaches have been used to overcome this problem. However, these approaches have an...

Big Text Data Clustering using Class Labels and Semantic Feature Based on Hadoop of Cloud Computing

Class labels for clustering can be generated automatically, but such labels are of much lower quality than labels specified by humans. If the class labels for clustering are provided, the clustering is more effective. In classic document clustering based on the vector model, documents are represented by term frequencies without considering the semantic information of each document. The property of the vector model may be incorrect...

A Novel Fast Non-negative Matrix Factorization Algorithm and Its Application in Text Clustering

In non-negative matrix factorization, it is difficult to find the optimal non-negative factor matrix in each iterative update. However, with the help of a transformation matrix, it is possible to derive the optimal non-negative factor matrix for the transformed cost function. A transformation-matrix-based non-negative matrix factorization method is proposed and analyzed. It shows that this new method, w...

Document clustering using nonnegative matrix factorization

A methodology for automatically identifying and clustering semantic features or topics in a heterogeneous text collection is presented. Textual data is encoded using a low rank nonnegative matrix factorization algorithm to retain natural data nonnegativity, thereby eliminating the need to use subtractive basis vector and encoding calculations present in other techniques such as principal compone...

Journal:

J. Inform. and Commun. Convergence Engineering

Volume 11, Issue

Pages -

Publication date 2013
